Designating eukaryotic orthology via processed transcription units

Authors

  • Meng-Ru Ho
  • Wen-Jung Jang
  • Chun-houh Chen
  • Lan-Yang Ch'ang
  • Wen-chang Lin
Abstract

Orthology is a widely used concept in comparative and evolutionary genomics. In addition to prokaryotic orthology, delineating eukaryotic orthology has provided insight into the evolution of higher organisms, and many eukaryotic ortholog databases have been established for this purpose. However, unlike in prokaryotes, alternative splicing (AS) in eukaryotes hampers orthology assignment. Existing databases therefore likely contain ambiguous eukaryotic ortholog relationships and may misclassify alternatively spliced protein isoforms as in-paralogs, which are duplicated genes that arise following speciation. Here, we propose a new approach for designating eukaryotic orthology using processed transcription units, and we present an orthology database prototype built from the human and mouse genomes. Existing programs cover less than 69% of the human reference sequences when assigning human/mouse orthologs. In contrast, our method encompasses up to 80% of the human reference sequences. Moreover, the ortholog database presented herein is more than 92% consistent with the existing databases. In addition to managing AS, this approach can identify orthologs of embedded genes and fusion genes using syntenic evidence. In summary, this new approach is sensitive and specific, and it can generate a more comprehensive and accurate compilation of eukaryotic orthologs.
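
The idea of assigning orthology at the level of transcription units rather than individual isoforms can be illustrated with a minimal sketch. It is not the authors' pipeline: the isoform and unit identifiers, the toy match list, and the helper names `collapse_to_units` and `coverage` are all hypothetical, and the matching step itself (sequence similarity, synteny) is assumed to have happened elsewhere. The sketch only shows how isoform-level matches collapse into unit-level pairs, so that splice isoforms of one locus are not counted as separate genes, and how a coverage fraction of the kind quoted in the abstract could be computed.

```python
from collections import defaultdict

# Hypothetical toy data: isoform ID -> transcription unit (locus) ID.
HUMAN_UNITS = {"hA.1": "hA", "hA.2": "hA", "hB.1": "hB", "hC.1": "hC"}
MOUSE_UNITS = {"mA.1": "mA", "mA.2": "mA", "mB.1": "mB"}

# Hypothetical isoform-level matches (human isoform, mouse isoform),
# assumed to come from an upstream similarity search.
ISOFORM_PAIRS = [("hA.1", "mA.1"), ("hA.2", "mA.2"), ("hB.1", "mB.1")]


def collapse_to_units(pairs, human_units, mouse_units):
    """Group isoform-level matches by their (human unit, mouse unit) pair."""
    unit_pairs = defaultdict(list)
    for h_iso, m_iso in pairs:
        key = (human_units[h_iso], mouse_units[m_iso])
        unit_pairs[key].append((h_iso, m_iso))
    return dict(unit_pairs)


def coverage(pairs, human_units):
    """Fraction of human isoforms that appear in at least one match."""
    covered = {h_iso for h_iso, _ in pairs}
    return len(covered) / len(human_units)


unit_level = collapse_to_units(ISOFORM_PAIRS, HUMAN_UNITS, MOUSE_UNITS)
for (h_unit, m_unit), members in sorted(unit_level.items()):
    print(h_unit, m_unit, members)
print("coverage:", coverage(ISOFORM_PAIRS, HUMAN_UNITS))
```

In this toy input the two isoforms of `hA` end up in a single `hA`/`mA` unit pair instead of appearing as two independent gene-level relationships, which is the kind of isoform/in-paralog confusion the abstract describes.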

Similar resources

Greedy phylogeny-based orthology assignment and its application to the evolutionary analysis of metabolic coupling

Orthologous proteins descend from a common ancestral protein via a speciation event and often keep their ancestral functions. Therefore, orthology assignment is often applied to identify gene content and functions in newly sequenced species. No commonly accepted gold standard exists so far for orthology assignment. One reason for this is a preference of different evolutionary mechanisms in diff...

Comprehensive analysis of orthologous protein domains using the HOPS database.

One of the most reliable methods for protein function annotation is to transfer experimentally known functions from orthologous proteins in other organisms. Most methods for identifying orthologs operate on a subset of organisms with a completely sequenced genome, and treat proteins as single-domain units. However, it is well known that proteins are often made up of several independent domains,...

Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes

Orthology detection is critically important for accurate functional annotation, and has been widely used to facilitate studies on comparative and evolutionary genomics. Although various methods are now available, there has been no comprehensive analysis of performance, due to the lack of a genomic-scale 'gold standard' orthology dataset. Even in the absence of such datasets, the comparison of r...

Operons in eukaryotes.

It was thought that polycistronic transcription is a characteristic of bacteria and archaea, where many of the genes are clustered in operons composed of two to more than ten genes. By contrast, the genes of eukaryotes are generally considered to be monocistronic, each with its own promoter at the 5' end and a transcription terminator at the 3' end; however, it has recently become clear that no...

mRNA maturation by two-step trans-splicing/polyadenylation processing in trypanosomes.

Trypanosomes are unique eukaryotic cells, in that they virtually lack mechanisms to control gene expression at the transcriptional level. These microorganisms mostly control protein synthesis by posttranscriptional regulation processes, like mRNA stabilization and degradation. Transcription in these cells is polycistronic. Tens to hundreds of protein-coding genes of unrelated function are array...

Journal title:

Volume 36  Issue 

Pages  -

Publication date 2008